A Fuzzy Clustering Approach for Missing Value Imputation with Non-Parameter Outlier Test

Authors

  • Jing Tian
  • Bing Yu
  • Dan Yu
  • Shilong Ma

Abstract

Missing values are a challenging issue in data mining, because missing information degrades both data quality and reliability. This paper proposes a fuzzy clustering algorithm for missing value imputation that is robust to noisy data. After resampling-based pre-clustering and an outlier test, the PCFKMI (Pre-Clustering based Fuzzy K-Means Imputation) method aggregates data instances into more accurate clusters via information entropy, enabling more appropriate estimation of the missing values. Experimental results demonstrate that the proposed PCFKMI achieves higher precision than other classic methods when completing missing values of both quantitative and nominal attributes, under all missingness mechanisms, at varying missing rates, and in the presence of abnormal values.
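
The abstract names the ingredients of PCFKMI but not their details, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It illustrates the fuzzy k-means imputation idea that the method's name refers to: cluster the complete records with fuzzy c-means, then fill each missing attribute with the membership-weighted average of the cluster centroids, where memberships are computed from the observed attributes only. Tukey's fences are swapped in as a stand-in for the paper's non-parametric outlier test (an assumption; the abstract does not name the test), and the resampling pre-clustering and information-entropy steps are omitted. The helper names (`tukey_inliers`, `memberships`, `fuzzy_c_means`, `fkm_impute`) and defaults (`c=2`, fuzzifier `m=2.0`, fence factor `k=1.5`) are hypothetical, and only numeric attributes are handled.

```python
import numpy as np


def tukey_inliers(X, k=1.5):
    """Assumed non-parametric outlier screen (Tukey's fences, not the paper's test).

    A row is kept only if every attribute lies within [Q1 - k*IQR, Q3 + k*IQR]
    of its column; no distributional form is assumed.
    """
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    return np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)


def memberships(D, m):
    """Fuzzy memberships u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)) along the last axis."""
    D = np.fmax(D, 1e-12)  # avoid division by zero when a point sits on a centroid
    ratio = D[..., :, None] / D[..., None, :]
    return 1.0 / np.sum(ratio ** (2.0 / (m - 1.0)), axis=-1)


def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means (fuzzy k-means) on a complete numeric matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # centroids weighted by u^m
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        U_new = memberships(D, m)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    W = U ** m
    V = (W.T @ X) / W.sum(axis=0)[:, None]  # centroids for the final memberships
    return V, U


def fkm_impute(X, c=2, m=2.0, seed=0):
    """Hypothetical helper: fill NaNs with membership-weighted centroid averages."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete = ~missing.any(axis=1)
    train = X[complete]
    train = train[tukey_inliers(train)]  # cluster only complete, non-outlying rows
    V, _ = fuzzy_c_means(train, c, m=m, seed=seed)
    out = X.copy()
    for i in np.flatnonzero(~complete):
        obs = ~missing[i]
        # Distance to each centroid uses the observed attributes only.
        d = np.linalg.norm(V[:, obs] - X[i, obs], axis=1)
        u = memberships(d, m)
        out[i, ~obs] = u @ V[:, ~obs]
    return out


if __name__ == "__main__":
    data = np.array([
        [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
        [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1], [10.05, 10.05],
        [100.0, -50.0],   # abnormal row, screened out before clustering
        [np.nan, 10.0],   # first attribute missing
    ])
    print(fkm_impute(data, c=2)[-1])
```

In this toy run the abnormal row is excluded before clustering, so it cannot pull a centroid toward itself, and the missing first attribute of the last row is filled mostly from the centroid whose observed second attribute is closest.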

Similar resources

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Much research has been done on the use of diffe...

A classifier ensemble approach for the missing feature problem

OBJECTIVES: Many classification problems must deal with data that contains missing values. In such cases data imputation is critical. This paper evaluates the performance of several statistical and machine learning imputation methods, including our novel multiple imputation ensemble approach, using different datasets. MATERIALS AND METHODS: Several state-of-the-art approaches are compared using...

Density-based Imputation Method for Fuzzy Cluster Analysis of Gene Expression Microarray Data

Fuzzy clustering has been widely used for analysis of gene expression microarray data. However, most fuzzy clustering algorithms require complete datasets and, because of technical limitations, most microarray datasets have missing values. To address this problem, we present a new algorithm where genes are clustered using the Fuzzy C-Means algorithm (FCM). The fuzzy partition obtained is then u...

Machine Learning Based Missing Value Imputation Method for Clinical Dataset

Missing value imputation is one of the biggest tasks of data pre-processing when performing data mining. Most medical datasets are usually incomplete. Simply removing the cases from the original datasets can bring more problems than solutions. A suitable method for missing value imputation can help to produce good quality datasets for better analysing clinical trials. In this paper we explore t...

Fuzzy Unordered Rules Induction Algorithm Used as Missing Value Imputation Methods for K-Mean Clustering on Real Cardiovascular Data

Missing value imputation is one of the biggest tasks of data pre-processing when performing data mining. Most medical datasets are usually incomplete. Simply removing the cases from the original datasets can bring more problems than solutions. A suitable method for missing value imputation can help to produce good quality datasets for better analysing clinical trials. In this paper we explore t...

Publication date: 2012